Identification of Therapeutic Agents Using Genetic Fingerprinting

ABSTRACT

Methods for identifying compounds having similar biological activity based on similarity in genetic expression profile in cells contacted with one or more of said compounds, along with specifically identified compounds, are disclosed.

This application claims priority of U.S. Provisional Applications Ser. No. 60/480,013, filed 20 Jun. 2003, and 60/517,369, filed 5 Nov. 2003, the disclosures of both of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to biologically active compounds and methods of identifying biologically active compounds based on the activity of such compounds in modulating the expression of a set of genes determined to be modulated by a plurality of compounds exhibiting a common biological activity.

BACKGROUND OF THE INVENTION

Many different agents are known to possess biological activity, including therapeutic activity, and for many of these the molecular mechanism of action is known. Thus, such compounds may be determined to be related to each other in that they have a common mechanism of action, which mechanism may bear some relationship to the chemical properties of the compounds or to their overall molecular shape. Alternatively, such compounds may not be similar in overall molecular shape or properties but may still, for diverse reasons, operate biologically in a similar manner. In addition, such compounds, related by mechanism of action (MOA), may also show other properties in common and thus these MOA-related sets of compounds may be formed into distinct groups based on their common biological activity. It would be advantageous to exploit this relationship based on common MOA by devising screening assays for other compounds having similar biological activity. Methods of analyzing gene expression, including rapid measurement of messenger RNA species coupled with reverse transcriptase-polymerase chain reaction amplification for ease of measurement, are suited to large screening assays and susceptible to high degrees of automation. Such genetic methods therefore present themselves as a ready medium for high throughput screening for agents having a selected biological activity. The present invention takes advantage of such methods to provide high throughput screening assays (HTSA) capable of rapidly identifying agents having therapeutic activity.

BRIEF SUMMARY OF THE INVENTION

In one aspect, the present invention relates to a method for identifying a compound having selected biological activity comprising:

(a) contacting a cell with each of a plurality of compounds exhibiting similar biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting whereby the relative changes in expression of said genes together form a gene expression profile;

(b) contacting a compound different from that of (a) with a cell containing said determined genes of (a) and determining a change in expression of said determined genes as a result of said contacting whereby the relative changes in expression of said determined genes together form the gene expression profile of (a) thereby identifying a biologically active compound.

In another aspect, the present invention relates to a method for identifying a compound with a selected activity, comprising:

(a) determining a change in the expression profile of a selected set of genes in the presence and absence of a first compound having a desired or selected activity,

(b) determining a change in the expression profile of the selected set of genes of step (a) in the presence and absence of a second compound,

(c) comparing said determined change in expression profile in step (b) with that in step (a),

wherein a determination in step (c) of the same or similar change in said expression profile identifies said second compound as a compound having said selected activity.

In an additional preferred embodiment, the cell of (a) is a colon cell, such as a cancer cell of such organ or tissue.

In one preferred embodiment, the compound of (b) is not an agent possessing known biological activity so that the methods of the invention find their greatest use in identifying novel agents with a selected biological activity.

The present invention also relates to related gene sets, including those whose polynucleotide sequences correspond to the sequences of SEQ ID NO: 1-12, which gene set is useful in the methods of the invention.

The present invention further relates to compounds identified as having biological activity by the methods of the invention. In preferred embodiments, such identified compounds have therapeutic activity, and/or anti-neoplastic activity, and/or enzyme inhibitory activity, as first determined by the methods disclosed herein.

The present invention also relates to a method for treating a disease comprising administering to an animal afflicted with said disease a therapeutically effective amount of a compound identified by the methods of the invention as having therapeutic activity. In a preferred embodiment, said therapeutic activity is anti-neoplastic activity.

DEFINITIONS

As used herein and except as noted otherwise, all terms are defined as given below.

In accordance with the present invention, the term “DNA segment” or “DNA sequence” refers to a DNA polymer, in the form of a separate fragment or as a component of a larger DNA construct, which has been derived from DNA isolated at least once in substantially pure form, i.e., free of contaminating endogenous materials and in a quantity or concentration enabling identification, manipulation, and recovery of the segment and its component nucleotide sequences by standard biochemical methods, for example, using a cloning vector. Such segments are provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes. Sequences of non-translated DNA may be present downstream from the open reading frame, where the same do not interfere with manipulation or expression of the coding regions.

The term “coding region” refers to that portion of a gene which either naturally or normally codes for the expression product of that gene in its natural genomic environment, i.e., the region coding in vivo for the native expression product of the gene. The coding region can be from a normal, mutated or altered gene, or can even be from a DNA sequence, or gene, wholly synthesized in the laboratory using methods well known to those of skill in the art of DNA synthesis.

In accordance with the present invention, the term “nucleotide sequence” refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the proteins provided by this invention are assembled from cDNA fragments and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.

The term “expression product” means that polypeptide or protein that is the natural translation product of the gene and any nucleic acid sequence coding equivalents resulting from genetic code degeneracy and thus coding for the same amino acid(s).

The term “fragment,” when referring to a coding sequence, means a portion of DNA comprising less than the complete coding region whose expression product retains essentially the same biological function or activity as the expression product of the complete coding region.

The term “primer” means a short nucleic acid sequence that is paired with one strand of DNA and provides a free 3′-OH end at which a DNA polymerase starts synthesis of a deoxyribonucleotide chain.

The term “promoter” means a region of DNA involved in binding of RNA polymerase to initiate transcription. The term “enhancer” refers to a region of DNA that, when present and active, has the effect of increasing expression of a different DNA sequence that is being expressed, thereby increasing the amount of expression product formed from said different DNA sequence.

The term “open reading frame (ORF)” means a series of triplets coding for amino acids without any termination codons and is a sequence (potentially) translatable into protein.

As used herein, reference to a DNA sequence includes both single stranded and double stranded DNA. Thus, the specific sequence, unless the context indicates otherwise, refers to the single strand DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and the complement of such sequence.

The term “percent identity” or “percent identical,” when referring to a sequence, means that a sequence is compared to a claimed or described sequence after alignment of the sequence to be compared (the “Compared Sequence”) with the described or claimed sequence (the “Reference Sequence”). The Percent Identity is then determined according to the following formula:

Percent Identity = 100 [1 − (C/R)]

wherein C is the number of differences between the Reference Sequence and the Compared Sequence over the length of alignment between the Reference Sequence and the Compared Sequence wherein (i) each base or amino acid in the Reference Sequence that does not have a corresponding aligned base or amino acid in the Compared Sequence and (ii) each gap in the Reference Sequence and (iii) each aligned base or amino acid in the Reference Sequence that is different from an aligned base or amino acid in the Compared Sequence, constitutes a difference; and R is the number of bases or amino acids in the Reference Sequence over the length of the alignment with the Compared Sequence with any gap created in the Reference Sequence also being counted as a base or amino acid.

If an alignment exists between the Compared Sequence and the Reference Sequence for which the percent identity as calculated above is about equal to or greater than a specified minimum Percent Identity then the Compared Sequence has the specified minimum percent identity to the Reference Sequence even though alignments may exist in which the hereinabove calculated Percent Identity is less than the specified Percent Identity.
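By way of non-limiting illustration only, the Percent Identity calculation defined above may be sketched in Python as follows. The sketch assumes the two sequences have already been aligned and are supplied as equal-length strings in which a hyphen marks a gap; the function name `percent_identity` and the gap symbol are assumptions made for illustration and form no part of the definition.

```python
def percent_identity(reference: str, compared: str, gap: str = "-") -> float:
    """Percent Identity = 100 * (1 - C/R) for two pre-aligned sequences."""
    if len(reference) != len(compared):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns where both rows hold a gap; they carry no information.
    columns = [(r, c) for r, c in zip(reference, compared)
               if not (r == gap and c == gap)]
    if not columns:
        raise ValueError("alignment is empty")

    # R: positions of the Reference Sequence over the alignment, with each
    # gap created in the Reference Sequence also counted as a position.
    r_count = len(columns)

    # C: columns whose symbols differ. This covers (i) a reference residue
    # with no aligned partner, (ii) a gap in the reference, and (iii) a mismatch.
    c_count = sum(1 for r, c in columns if r != c)

    return 100.0 * (1.0 - c_count / r_count)


# Hypothetical aligned sequences: one mismatch in ten aligned positions.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```

Under these assumptions, a Compared Sequence would meet a specified minimum of, for example, 90% if any one alignment of the two sequences yields a value about equal to or greater than 90.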

As used herein, the terms “portion,” “segment,” and “fragment,” when used in relation to polypeptides, refer to a continuous sequence of residues, such as amino acid residues, which sequence forms a subset of a larger sequence. For example, if a polypeptide were subjected to treatment with any of the common endopeptidases, such as trypsin or chymotrypsin, the oligopeptides resulting from such treatment would represent portions, segments or fragments of the starting polypeptide. When used in relation to polynucleotides, such terms refer to the products produced by treatment of said polynucleotides with any of the common endonucleases, or any stretch of polynucleotides that could be synthesized.

The term “correspond” means that the gene has the indicated nucleotide sequence or that it encodes substantially the same RNA as would be encoded by the indicated sequence, the term “substantially” meaning about at least 90% identical as defined elsewhere herein and includes splice variants thereof.

The term “corresponding genes” refers to genes that encode an RNA that is at least 90% identical, preferably at least 95% identical, most preferably at least 98% identical, and especially identical, to an RNA encoded by one of the nucleotide sequences disclosed herein (i.e., SEQ ID NO: 1-12). Such genes will also encode the same polypeptide sequence as any of the sequences disclosed herein, preferably SEQ ID NO: 1-12, but may include differences in such amino acid sequences where such differences are limited to conservative amino acid substitutions, such as where the same overall three dimensional structure, and thus the same antigenic character, is maintained. Thus, amino acid sequences may be within the scope of the present invention where they react with the same antibodies that react with polypeptides comprising the sequences of SEQ ID NO: 1-12 as disclosed herein.

The term “related gene set” refers to a set of genes, perhaps 5, 10 or more genes, such as those corresponding to the sequences disclosed herein, whose pattern of expression in a cell is modulated by a given set of biologically active agents, especially where said agents exert said activity by a common molecular mechanism.

As used herein, the terms “gene expression profile” or “gene expression fingerprint” are interchangeable and refer to the pattern of gene expression modulation, including increase or decrease of expression, exhibited by the members of a set of chemical agents with established biological activity when determined using a related gene set. Thus, for a set of 10 genes, possibly genes 1-6 are reduced in expression and genes 7-10 are increased in expression after contact with each of a set of agents having common biological activity. These genes represent a related gene set. The profile or fingerprint will include the relative degree of increase or decrease of expression of the genes of the set in response to the presence of a given concentration of an established biologically active agent (for example, expression of gene 1 may be reduced by half, gene 2 by ⅔, gene 3 not expressed at all, gene 7 doubled in expression, gene 10 increased 3 fold in expression, and so on in response to each of the compounds of the set and relative to the steady state levels of said genes). In the typical case, compound A is introduced into the growth medium of the cells. The result is a gene expression profile, or gene expression fingerprint, or expression fingerprint, for compound A and other compounds of the set possessing common biological activity.
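By way of non-limiting illustration, the example in the foregoing definition may be expressed as a small Python sketch in which a profile is stored as the ratio of each gene's expression after contact with a compound to its steady state level. The function name `expression_profile`, the gene labels, and the RNA levels are hypothetical values chosen only to reproduce the relative changes recited above; no particular data representation is required by the methods disclosed herein.

```python
def expression_profile(treated, steady_state):
    """Return gene -> ratio of treated expression to steady state expression.

    A ratio below 1 means reduced expression, a ratio above 1 means
    increased expression, and 0 means the gene is no longer expressed.
    """
    profile = {}
    for gene, baseline in steady_state.items():
        if baseline <= 0:
            raise ValueError(f"steady state level for {gene} must be positive")
        profile[gene] = treated[gene] / baseline
    return profile


# Hypothetical RNA levels (arbitrary units) for some genes of a related gene set.
steady_state = {"gene1": 100.0, "gene2": 90.0, "gene3": 50.0,
                "gene7": 40.0, "gene10": 20.0}
# Levels after contact with compound A: gene 1 halved, gene 2 reduced by 2/3,
# gene 3 not expressed, gene 7 doubled, gene 10 increased 3 fold.
treated_with_a = {"gene1": 50.0, "gene2": 30.0, "gene3": 0.0,
                  "gene7": 80.0, "gene10": 60.0}

profile_a = expression_profile(treated_with_a, steady_state)
```

Storing ratios rather than raw levels is one possible design choice; it makes profiles obtained from different experiments directly comparable, provided each is normalized to its own steady state measurement.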

As used herein, the term “compound classifier” refers to a profile of transcriptional changes across a specific set of 10-40 genes that are induced in cells by multiple chemotypes with similar mechanisms of action or biologic function. A compound classifier simply seeks to define a unique transcriptional profile associated with a group of related compounds.

As used herein, a “compound profiler” can be generated by the combination of compound classifiers from a diversified reference compound collection. The compound profiler provides a global comparison of compound treatments to a reference compound database. Compound profilers can be effectively used to define and predict a variety of properties of interest.

DETAILED SUMMARY OF THE INVENTION

In accordance with the present invention, model cellular systems using cell lines, primary cells, or tissue samples are maintained in growth medium and may be treated with compounds at a single concentration or at a range of concentrations. At specific times after treatment, cellular RNAs are isolated from the treated cells, primary cells or tissues, which RNAs are indicative of expression of the different genes. The cellular RNA is then divided and subjected to analysis that detects the presence and/or quantity of specific RNA transcripts, which transcripts may then be amplified for detection purposes using standard methodologies, such as, for example, reverse transcriptase polymerase chain reaction (RT-PCR), etc. The presence or absence, or levels, of specific RNA transcripts are determined from these measurements and a metric derived for the type and degree of response of the sample versus the steady state levels of such transcripts when the compound is not present. The relative levels of RNA transcripts following said contacting with each of a set of agents having established biological activity, including therapeutic activity, such as anti-neoplastic activity, and/or enzyme inhibitory activity and the like serve to define a related gene set and the expression profile of this set provides the fingerprint for the established biologically active agent.

Also in accordance with the present invention, there are disclosed a set of genes and gene sequences whose expression is, or can be, as a result of the methods of the present invention, used to define a related gene set. Thus, the methods of the present invention identify novel therapeutic, including anti-neoplastic, agents based on their exhibiting the same fingerprint as an established biologically active agent (and disclosed herein in specific model systems, such as the HT29 colon cancer cell line). The methods of the invention may be used with a variety of cell lines or with primary samples from tissues maintained in vitro under suitable culture conditions for varying periods of time, or in situ in suitable animal models.

The present invention also provides screening assays for identifying biologically active agents, whether the underlying chemical structures are novel or otherwise, based on the action of such agents to modulate such gene sets in a manner similar to that of an established biologically active agent.

In one highly specific embodiment of the present invention, an established biologically active agent, such as an agent found to inhibit the growth or metastasis of, or kill, cancerous cells, is used to identify a set of cancer related genes by determining the genes present in a cancerous cell whose expression is modulated when said cell is contacted with each of a set of agents having established biological activity, including therapeutic activity, such as anti-neoplastic activity, and/or enzyme inhibitory activity and the like. Thus, genes whose expression changes as a result of such contacting versus when said contacting does not occur (i.e., the steady state levels of such gene expression), for example genes found to show increased expression, may then be grouped as a cancer related gene set.

In a highly specific but non-limiting example, where said biological activity is anti-neoplastic activity, an established anti-neoplastic agent, compound A, is determined to modulate the expression of 10 genes found in a colon cancer cell, such as an adenocarcinoma, whereby these genes show a varying pattern of expression following contacting of the cell with compound A. For example, genes 1 to 7 show reduced expression, or non-expression, while genes 8 to 10 show expression, or increased expression, as a result of said contacting. This set of 10 genes thus represents a cancer related gene set as defined herein. Each of said 10 genes may be modulated to a different extent by said established anti-neoplastic agent. For example, expression of gene 1 may be reduced to a level where expression is no longer detected while gene 2 is reduced to half its expression when compound A is not present. The relative levels of expression of each of the genes in the presence and absence of compound A serve to establish an expression pattern, which then represents a mechanism of action of compound A, or is related to, or indicative of, a mechanism of action of compound A. In accordance with the methods disclosed herein, cancer cells (for example, colon cancer cells) containing these genes are subsequently contacted, for example, in vitro, with agents whose anti-neoplastic activity is to be determined and the expression pattern of the genes of this cancer related gene set (defined by the expression pattern produced by compound A) following said contacting is determined. As a result of such screening, an additional agent, compound B, is found to modulate this same set of genes, and by the same relative amounts (i.e., the same expression pattern results). Thus, compound B is deemed to be an anti-neoplastic agent and compounds A and B are deemed to act by the same, or similar, mechanism. In a preferred embodiment, the compounds to be screened will be compounds having structural similarity to compound A.
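The comparison of compound B with compound A in the foregoing example may be sketched, purely for illustration, as follows. The sketch assumes each expression pattern is stored as a ratio of expression in the presence of the compound to expression in its absence, and it treats two patterns as the same when every gene moves in the same direction and the ratios are strongly correlated. The Pearson correlation measure, the 0.9 threshold, and all names and values are assumptions introduced here; the disclosure does not prescribe a particular similarity measure.

```python
import math


def direction(ratio):
    """+1 for increased expression, -1 for reduced expression, 0 for no change."""
    if ratio > 1.0:
        return 1
    if ratio < 1.0:
        return -1
    return 0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with no variation cannot be correlated")
    return cov / (sd_x * sd_y)


def shares_fingerprint(reference, candidate, min_correlation=0.9):
    """True if the candidate modulates the reference gene set in the same
    directions and by similar relative amounts (hypothetical criterion)."""
    genes = sorted(reference)
    if any(gene not in candidate for gene in genes):
        raise KeyError("candidate profile is missing genes of the gene set")
    same_direction = all(direction(reference[g]) == direction(candidate[g])
                         for g in genes)
    r = pearson([reference[g] for g in genes], [candidate[g] for g in genes])
    return same_direction and r >= min_correlation


# Hypothetical expression ratios for six genes of a cancer related gene set.
compound_a = {"gene1": 0.0, "gene2": 0.5, "gene3": 0.4,
              "gene8": 2.0, "gene9": 3.0, "gene10": 1.5}
compound_b = {"gene1": 0.05, "gene2": 0.55, "gene3": 0.35,
              "gene8": 2.1, "gene9": 2.8, "gene10": 1.6}

b_matches_a = shares_fingerprint(compound_a, compound_b)
```

Requiring agreement in direction as well as correlation is a deliberate choice in this sketch: correlation alone would not distinguish a compound that shifts every gene upward from one that reproduces the mixed pattern of reductions and increases.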

In an alternative example of the foregoing, the related gene set (here, a cancer related gene set) may be independently determined to be involved in the cancerous state without recourse to the modulating ability of any known anti-neoplastic agent. Such cancer related gene set is then utilized as the basis for screening for anti-neoplastic agents de novo, thereby resulting in the identification of compound A. In one such embodiment, the genes modulated by compound A may represent a subset of the cancer related gene set. The structure of compound A is then utilized as a basis for testing other compounds for anti-neoplastic activity where such other compounds to be tested are of similar chemical structure to compound A (in the same manner as described above).

As disclosed herein, a set of genes is identified that are expressed at varying levels in a cell in response to contact with each of a set of compounds exhibiting a common biological activity, or possessing a similar mechanism of action, including one unrelated to the modulation of said gene set, and said gene set forms a related gene set. Such related gene sets are deemed “fingerprints” for identifying additional agents with a selected biological activity by their ability to modulate such gene sets, and in the same relative amounts, as agents exhibiting said selected biological activity. Thus, the relative modulation of the same gene set acts as a “fingerprint” for other biologically active agents. In accordance with the present invention, such selected biological activity may include therapeutic activity, such as anti-neoplastic activity, and/or enzyme inhibitory activity and the like.

In one embodiment, the present invention relates to a method for identifying a compound having selected biological activity comprising:

(a) contacting a cell with each of a plurality of compounds exhibiting similar biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting whereby the relative changes in expression of said genes together form a gene expression profile;

(b) contacting a compound different from that of (a) with a cell containing said determined genes of (a) and determining a change in expression of said determined genes as a result of said contacting whereby the relative changes in expression of said determined genes together form the gene expression profile of (a) thereby identifying a biologically active compound.

In another aspect, the present invention relates to a method for identifying a compound with a selected activity, comprising:

(a) determining a change in the expression profile of a selected set of genes in the presence and absence of a first compound having a desired or selected activity,

(b) determining a change in the expression profile of the selected set of genes of step (a) in the presence and absence of a second compound, preferably wherein the second compound is a test compound whose activity is to be determined or compared with that of the first compound,

(c) comparing said determined change in expression profile in step (b) with that in step (a),

wherein a determination in step (c) of the same or similar change in said expression profile identifies said second compound as a compound having said selected activity.

In preferred embodiments, the biological activity may include therapeutic activity, enzyme inhibitory activity, and/or anti-neoplastic activity. In other preferred embodiments, the compounds useful in step (a) (i.e., as a first compound) comprise one or more topoisomerase II inhibitors, especially one or more selected from Camptothecine (S, +), beta-Lapachone, Suramin sodium salt, Aclacinomycin A from Streptomyces galilaeus, Mitoxantrone dihydrochloride, Etoposide, Doxorubicin hydrochloride, Aurintricarboxylic acid, Epirubicin hydrochloride, and m-AMSA hydrochloride.

In an additional preferred embodiment, the cell of (a) is a colon cell, such as a cancer cell of such organ or tissue. The cells utilized in the methods of the invention may also be recombinant cells engineered to express the determined genes, such as one or more genes of a related, or other selected, gene set, including where the recombinant cell does not express the determined genes absent being engineered to do so, such as by genetic engineering.

In one preferred embodiment, the compound of (b) is not an agent possessing known biological activity so that the methods of the invention find their greatest use in identifying novel agents with a selected biological activity.

The present invention also relates to related gene sets, including those whose polynucleotide sequences correspond to the sequences of SEQ ID NO: 1-12, which gene set is useful in the methods of the invention. These genes have sequences known in the literature and are summarized in Table 1 with reference to their GenBank Accession numbers. Descriptions of the genes are provided in Table 2.

Thus, as a general example, the present invention comprises a method for determining whether a compound functions through a known mechanism of action, comprising:

(a) contacting said compound with a defined cell line;

(b) determining the expression pattern of a defined number of genes of said cell line; and

(c) comparing said expression patterns of (b) with the expression pattern of said defined number of genes of said cell line with at least one reference compound that functions through a known mechanism based on the similarity of the gene expression of said compound and said at least one reference compound.

TABLE 1

SEQ ID NO.  probeId  accession  locusLinkId  unigeneId  geneName  genMappId
1           OG1505   AA634799   26298        Hs.182339  EHF       U54617
2           OG1127   NM_006017  8842         Hs.112360  PROM1     AF027208
3           OG798    NM_014312  23584        Hs.112377  CTXL      AI799005
4           OG812    NM_016377  9465         Hs.12835   AKAP7     AF047715
5           OG838    NM_024320  79170        Hs.36529   MGC11242  NM_024320
6           OG477    NM_032192  84152        Hs.286192  PPP1R1B   AK024593
7           OG1321   XM_006697  54997        Hs.18791   TSC       AA883422
8           OG1234   XM_017384  4316         Hs.2256    MMP7      L22524
9           OG892    XM_030447  6319         Hs.119597  SCD       AF097514
10          OG252    XM_032759  6678         Hs.111779  SPARC     J03040
11          OG922    XM_043412  1026         Hs.179665  CDKN1A    L25610
12          OG1551   XM_047592  30061        Hs.5944    SLC11A3   AF226614

TABLE 2

SEQ ID NO.  gene Description                                  av Category                        ref Seq Acc
1           ets homologous factor                             Colon differential
2           prominin 1                                        Up in Epithelial                   NM_006017
3           cortical thymocyte receptor (X. laevis CTX) like  Colon differential; TSA regulated  NM_014312
4           A kinase (PRKA) anchor protein 7                  plasma membrane                    NM_016377
5           hypothetical protein MGC11242                                                        W95024
6           protein phosphatase 1, regulatory (inhibitor)     Up in Epithelial                   NM_032192
            subunit 1B (dopamine and cAMP regulated
            phosphoprotein, DARPP-32)
7           hypothetical protein FLJ20607                     Colon differential_MMC             XM_006697
8           matrix metalloproteinase 7 (matrilysin, uterine)  Colon differential; TSA regulated  XM_017384
9           stearoyl-CoA desaturase (delta-9-desaturase)      TOX genes down in tox              XM_030447
10          secreted protein, acidic, cysteine-rich           Breast up                          XM_032759
            (osteonectin)
11          cyclin-dependent kinase inhibitor 1A (p21, Cip1)  cell cycle; Oncogenes/TSGs         XM_043412
12          solute carrier family 11 (proton-coupled          plasma membrane                    XM_047592
            divalent metal ion transporters), member 3

The nucleotide and amino acid sequences deposited in GenBank, along with ancillary information included therewith, under the accession numbers identified in Tables 1 and 2, are hereby incorporated by reference in their entirety.

The present invention further relates to compounds identified as having biological activity by the methods of the invention. In preferred embodiments, such identified compounds have therapeutic activity, and/or anti-neoplastic activity, and/or enzyme inhibitory activity, as first determined by the methods disclosed herein.

The present invention also relates to a method for treating a disease comprising administering to an animal afflicted with said disease a therapeutically effective amount of a compound identified by the methods of the invention as having therapeutic activity. In a preferred embodiment, said therapeutic activity is anti-neoplastic activity.

The present invention further relates to a method for identifying a related gene set, as defined herein, comprising:

contacting a cell with each of a plurality of compounds having common biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting where contacting with each of said plurality of compounds results in the same relative changes of expression of said genes and thereby identifying said genes as a related gene set.
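The foregoing method of identifying a related gene set may be sketched, by way of non-limiting example, as a filter that retains only those genes whose expression every compound of the plurality changes in the same direction. The sketch assumes each compound's result is stored as expression ratios relative to steady state; the 1.5 fold cutoff, the function name `related_gene_set`, and the compound and gene labels are hypothetical choices made for illustration and are not taken from the disclosure.

```python
def related_gene_set(profiles, min_fold=1.5):
    """Return genes that all compounds modulate in the same direction.

    profiles maps compound name -> {gene: expression ratio vs. steady state}.
    A gene is kept when every compound increases it by at least min_fold,
    or every compound reduces it to at most 1/min_fold.
    """
    if not profiles:
        raise ValueError("at least one compound profile is required")

    # Only genes measured for every compound can be compared.
    common_genes = set.intersection(*(set(p) for p in profiles.values()))

    selected = []
    for gene in sorted(common_genes):
        ratios = [profile[gene] for profile in profiles.values()]
        all_increased = all(r >= min_fold for r in ratios)
        all_reduced = all(r <= 1.0 / min_fold for r in ratios)
        if all_increased or all_reduced:
            selected.append(gene)
    return selected


# Hypothetical ratios for three compounds sharing a biological activity.
profiles = {
    "compound_1": {"g1": 0.4, "g2": 2.0, "g3": 2.0, "g4": 1.0},
    "compound_2": {"g1": 0.5, "g2": 2.5, "g3": 0.5, "g4": 1.1},
    "compound_3": {"g1": 0.3, "g2": 1.8, "g3": 1.0, "g4": 0.9},
}

gene_set = related_gene_set(profiles)
```

In this sketch g3 is excluded because the compounds move it in different directions, and g4 is excluded because none of them changes it appreciably. A stricter filter could additionally require the magnitudes of change to agree across compounds.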

In addition, the invention specifically contemplates the testing of compounds in (b) that were not known biologically active agents but also encompasses cases where the agent may have been known to have such biological activity in one kind of cell but not others that can be tested using the methods herein. In addition, such known, or suspected, biological activity may have been previously determined to involve a different molecular mechanism than that utilized by the methods of the present invention.

In one highly specific embodiment, the related gene set is a cancer related gene set, identified by the modulation of all of its member genes by a given anti-neoplastic agent. The present invention provides a method of using this set as a fingerprint for other anti-neoplastic agents by the method comprising the steps of:

(a) exposing a known cancerous cell to a chemical agent to be tested for anti-neoplastic activity;

(b) allowing said chemical agent to modulate the activity of one or more genes present in said cell wherein said genes include or comprise a cancer related gene set, such as the sequences of SEQ ID NO: 1-12, or sequences substantially identical to said sequences, or the complements of any of the foregoing;

(c) determining or detecting the expression of one or more genes of step (b);

(d) comparing the expression of said genes in the presence and absence of exposure to said chemical agent;

wherein a difference in expression of all of these genes is indicative of anti-neoplastic activity.

In specific embodiments, this relates to the genes whose sequences correspond to the sequences of SEQ ID NO: 1-12. As used herein, the term “correspond” means that the gene has the indicated nucleotide sequence or that it encodes substantially the same RNA as would be encoded by the indicated sequence, the term “substantially” meaning about at least 90% identical as defined elsewhere herein and includes splice variants thereof.

The sequences disclosed herein may be genomic in nature and thus represent the sequence of an actual gene, such as a human gene, or may be a cDNA sequence derived from a messenger RNA (mRNA) and thus represent contiguous exonic sequences derived from a corresponding genomic sequence or they may be wholly synthetic in origin for purposes of practicing the processes of the invention. Because of the processing that may take place in transforming the initial RNA transcript into the final mRNA, the sequences disclosed herein may represent less than the full genomic sequence. They may also represent sequences derived from ribosomal and transfer RNAs. Consequently, the genes present in the cell (and representing the genomic sequences) and the sequences disclosed herein, which are mostly cDNA sequences, may be identical or may be such that the cDNAs contain less than the full genomic sequence. Such genes and cDNA sequences are still considered corresponding sequences because they both encode similar RNA sequences. Thus, by way of non-limiting example only, a gene that encodes an RNA transcript, which is then processed into a shorter mRNA, is deemed to encode both such RNAs and therefore encodes an RNA complementary to (using the usual Watson-Crick complementarity rules), or that would otherwise be encoded by, a cDNA (for example, a sequence as disclosed herein). Thus, the sequences disclosed herein correspond to genes contained in the cancerous or normal cells used to determine relative levels of expression because they represent the same sequences or are complementary to RNAs encoded by these genes. Such genes also include different alleles and splice variants that may occur in the cells used in the processes of the invention.

The genes of the invention “correspond to” a polynucleotide having a sequence of SEQ ID NO: 1-12, if the gene encodes an RNA (processed or unprocessed, including naturally occurring splice variants and alleles) that is at least 90% identical, preferably at least 95% identical, most preferably at least 98% identical to, and especially identical to, an RNA that would be encoded by, or be complementary to, such as by hybridization with, a polynucleotide having the indicated sequence. In addition, genes including sequences at least 90% identical to a sequence selected from SEQ ID NO: 1-12, preferably at least about 95% identical to such a sequence, more preferably at least about 98% identical to such sequence and most preferably comprising such sequence are specifically contemplated by all of the processes of the present invention as being genes that correspond to these sequences. In addition, sequences encoding the same proteins as any of these sequences, regardless of the percent identity of such sequences, are also specifically contemplated by any of the methods of the present invention that rely on any or all of said sequences, regardless of how they are otherwise described or limited. Thus, any such sequences are available for use in carrying out any of the methods disclosed according to the invention. Such sequences also include any open reading frames, as defined herein, present within any of the sequences of SEQ ID NO: 1-12.

In carrying out the foregoing assays, relative biological activity may be ascertained by the extent to which a given chemical agent modulates the expression of genes present in a cell of a particular tissue or organ, such as where said genes are part of the genome of said cell. Thus, a first chemical agent that modulates the expression of the genes of a related gene set, or some other selected gene set, where biological activity is therapeutic activity, to a larger degree than a second chemical agent tested by the assays of the invention is thereby deemed to have higher, or more desirable, or more advantageous, therapeutic activity than said second chemical agent. However, the tested agent is deemed therapeutic if it modulates the same related gene set (i.e., has the same gene expression fingerprint) as an established therapeutic agent, although the extent of such modulation may vary somewhat as to one or more of the genes of said gene set.

The gene expression to be measured is commonly assayed using RNA expression as an indicator. Thus, the greater the level of RNA (messenger RNA) detected, the higher the level of expression of the corresponding gene. Thus, gene expression, either absolute or relative, such as here where the expression of several different genes is being quantitatively evaluated and compared in order to establish the gene expression profile of a test compound (for example, for the genes of a related gene set as disclosed herein), is determined by the relative expression of the RNAs encoded by the various gene members of the set.

RNA may be isolated from samples in a variety of ways, including lysis and denaturation with a phenolic solution containing a chaotropic agent (e.g., TRIzol) followed by isopropanol precipitation, ethanol wash, and resuspension in aqueous solution; or lysis and denaturation followed by isolation on solid support, such as a Qiagen resin, and reconstitution in aqueous solution; or lysis and denaturation in non-phenolic, aqueous solutions followed by enzymatic conversion of RNA to DNA template copies.

Normally, prior to applying the processes of the invention, steady state RNA expression levels for the genes, and sets of genes, disclosed herein will have been obtained. It is the steady state level of such expression that is affected by potential biologically active agents as determined herein. Such steady state levels of expression are easily determined by any methods that are sensitive, specific and accurate. Such methods include, but are in no way limited to, real time quantitative polymerase chain reaction (PCR), for example, using a Perkin-Elmer 7700 sequence detection system with gene specific primer probe combinations as designed using any of several commercially available software packages, such as Primer Express software; solid support based hybridization array technology using appropriate internal controls for quantitation, including filter, bead, or microchip based arrays; and solid support based hybridization arrays using, for example, chemiluminescent, fluorescent, or electrochemical reaction based detection systems.

The gene expression profiling or fingerprinting of the present invention is used in the same way as chemical and molecular data to identify the compounds of the invention. For example, where an established biologically active agent is known to have a particular gene expression fingerprint as defined herein, the present invention contemplates all chemical agents having said fingerprint, especially where said agent was not previously known or suspected of having the established biological activity.

If, for example, an average screening run covers a library of some 50,000 chemical compounds, and the genes within the related gene set defined by modulation by compound A, an established biologically active agent, consistently change their patterns of expression in response to particular chemicals (e.g., 5 of the genes always change expression in a coordinated way, or down-regulation of one gene within a group of 10 is always accompanied by down-regulation of the other 9 specific genes as well), and do so with the same profile or fingerprint as for compound A, then these compounds are identified as biologically active agents for further testing for biological activity, such as in vivo.

The biologically active agents identified by the methods disclosed herein may be useful for therapeutic or research purposes and, when such is the case, they are commonly used in the form of a composition. The pharmaceutical compositions useful herein also contain a pharmaceutically acceptable carrier, including any suitable diluent or excipient, which includes any pharmaceutical agent that does not itself induce the production of antibodies harmful to the individual receiving the composition, and which may be administered without undue toxicity. Pharmaceutically acceptable carriers include, but are not limited to, liquids such as water, saline, glycerol and ethanol, and the like, including carriers useful in forming sprays for nasal and other respiratory tract delivery or for delivery to the ophthalmic system. A thorough discussion of pharmaceutically acceptable carriers, diluents, and other excipients is presented in REMINGTON'S PHARMACEUTICAL SCIENCES (Mack Pub. Co., N.J. current edition).

The present invention also relates to recombinant cells engineered to contain intrachromosomally or extrachromosomally one or more genes that together form a related gene set as described herein. Such recombinant cells are genetically engineered (transduced or transformed or transfected) with suitable vectors, which may be, for example, a cloning vector or an expression vector. The vector may be, for example, in the form of a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.

The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the DNA sequence is inserted into an appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art.

The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. As representative examples of such promoters, there may be mentioned: the LTR or SV40 promoter, the E. coli lac or trp promoter, the phage lambda P_(L) promoter and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression.

In addition, the expression vectors preferably contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.

The vector containing the appropriate DNA sequence as hereinabove described, as well as an appropriate promoter or control sequence, may be employed to transform an appropriate host to permit the host to express the protein.

As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.

Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda P_(R), P_(L) and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.

In a further embodiment, the present invention relates to host cells containing the above-described constructs, such as the genes forming a related gene set as defined herein. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, or electroporation.

Common methods useful herein are those described in detail in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y., (1989), Wu et al., Methods in Gene Biotechnology (CRC Press, New York, N.Y., 1997), and Recombinant Gene Expression Protocols, in Methods in Molecular Biology, Vol. 62, (Tuan, ed., Humana Press, Totowa, N.J., 1997), the disclosures of which are hereby incorporated by reference.

The present invention also relates to a process that comprises a method for producing a product, such as by generating test data to facilitate identification of such product, comprising identifying an agent according to one of the disclosed processes for identifying such an agent (i.e., the therapeutic agents identified according to the assay procedures disclosed herein) wherein said product is the data collected with respect to said agent as a result of said identification process, or assay, and wherein said data is sufficient to convey the chemical character and/or structure and/or properties of said agent. For example, the present invention specifically contemplates a situation whereby a user of an assay of the invention may use the assay to screen for compounds having the desired enzyme modulating activity and, having identified the compound, then conveys that information (i.e., information as to structure, dosage, etc.) to another user who then utilizes the information to reproduce the agent and administer it for therapeutic or research purposes according to the invention. For example, the user of the assay (user 1) may screen a number of test compounds without knowing the structure or identity of the compounds (such as where code numbers are used and the first user is simply given samples labeled with said code numbers) and, after performing the screening process, using one or more assay processes of the present invention, then imparts to a second user (user 2), verbally or in writing or some equivalent fashion, sufficient information to identify the compounds having a particular modulating activity (for example, the code number with the corresponding results). This transmission of information from user 1 to user 2 is specifically contemplated by the present invention.

In accordance with the foregoing, the present invention relates to a method for producing test data with respect to the biological activity of a compound comprising:

(a) contacting a cell with each of a plurality of compounds exhibiting similar biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting whereby the relative changes in expression of said genes together forms a gene expression profile;

(b) contacting a compound different from that of (a) with the determined genes of (a) and determining a change in expression of said determined genes as a result of said contacting whereby the relative changes in expression of said determined genes together forms the gene expression profile of (a) thereby identifying a biologically active compound; and

(c) producing test data with respect to the gene modulating activity of said compound based on the gene expression profile indicating biological activity.

In one embodiment of the invention, a compound profiler is generated by the following method:

1) Sort the reference compounds into groups based on supervised and/or unsupervised means:
   a. Supervised method: based upon the similar mechanisms of action or biological function of related reference compounds reported in the literature.
   b. Unsupervised method: based on the similarity of the gene profiles (gene clustering or Fishing) or biological function profiles (GO profile).
   c. Combination of the supervised and unsupervised methods: refine the groups generated by the unsupervised method with the supervised method.
2) Generate a compound classifier by identifying a set of genes that changes significantly in a cell line or across several cell lines for a given compound or series of compounds (analogs):
   a. Core genes from a single cell type, refined by time and/or dose series experiment.
   b. Core genes from multiple cell types, refined by time and/or dose series experiment.
   c. Core genes that can be used to separate one compound bin from the other bins in the reference database.
   d. Identify a common set of core genes as the compound classifier.
3) Combine the compound classifiers from a given reference compound database into a compound profile:
   a. Compute the similarity matrices of the gene profiles of an unknown compound against the reference compound classifiers.
   b. Plot all the similarity matrix scores of the unknown compounds against the classifiers to generate the profile map.
   c. Compare the profile maps between the unknown compound and the reference compounds.
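Step 2 of the method above can be sketched in a few lines. The following is a minimal illustration only, assuming that the genes that change significantly have already been determined for each cell line; the function names, cell line keys, bin names and gene labels are hypothetical and do not come from the disclosure. Core genes are taken here as the genes shared by every cell line, and bin-separating genes as the core genes not found in any other compound bin.

```python
def core_genes(changed_by_cell_line):
    """Genes flagged as significantly changed in every cell line.

    changed_by_cell_line: dict mapping a cell line name to an iterable
    of gene labels that changed significantly in that cell line.
    """
    gene_sets = [set(genes) for genes in changed_by_cell_line.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def separating_genes(core_by_bin, target_bin):
    """Core genes of target_bin that appear in no other compound bin."""
    others = set().union(
        *(set(genes) for name, genes in core_by_bin.items() if name != target_bin)
    )
    return set(core_by_bin[target_bin]) - others


# Hypothetical data: two cell lines, then two compound bins.
shared = core_genes({"line_1": ["G1", "G2", "G3"], "line_2": ["G2", "G3", "G4"]})
bins = {"bin_A": shared, "bin_B": {"G3", "G5"}}
print(sorted(shared))                           # ['G2', 'G3']
print(sorted(separating_genes(bins, "bin_A")))  # ['G2']
```

Requiring a gene to change in every cell line is a strict choice made for this sketch; a majority rule or a statistical threshold could be substituted without changing the overall structure.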

In accordance with the present invention, compound profiles were generated based upon reported mechanisms of action (MOAs) for reference compounds. However, one could attempt to establish predictive compound profiles based around other compound groupings, including those based on such properties as toxicity, selectivity, specific molecular targets and/or in vivo efficacy.

In one such example, a compound profile was generated by the following algorithm:

1. Sort reference compounds into groups based upon MOA reported in literature.
2. Treat cells at 5×IC₅₀, IC₉₀, or 40 mM (maximum concentration). Isolate RNA at specific time points following treatment and perform METS analysis.
3. Check the gene expression profiles within each MOA group and, if necessary, divide the compounds into subclasses (e.g. DNA synthesis inhibitors).
4. Search for the genes that uniquely show significantly decreased or increased expression levels for each MOA grouping.
5. Test and validate with a test data set.
6. Compare relative gene expression levels of the test sample to the expected gene expression levels of the classifiers.
7. Plot the Pearson correlation coefficient of each classifier against the compound treatment to generate an MOA profile map.
8. Compute the similarity matrix of the MOA profile maps among the unknown compounds to define the unknown compound groups.
9. Classify the property of the biological function of the unknown based on the similarity of the MOA profiles between the reference compounds and the unknowns.
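Steps 6, 7 and 9 of the algorithm above can be illustrated as follows. This is a sketch under stated assumptions, not the implementation used to generate the profiles: each classifier is assumed to be stored as a mapping from gene label to expected expression change, the MOA names and gene labels are hypothetical, and the unknown is simply assigned to the classifier with the highest Pearson correlation coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def moa_profile_map(sample, classifiers):
    """Correlate a sample's expression changes with each MOA classifier.

    sample: dict gene -> measured relative expression change
    classifiers: dict MOA name -> dict gene -> expected change
    Only genes present in both the sample and the classifier are used.
    """
    profile = {}
    for moa, expected in classifiers.items():
        genes = sorted(g for g in expected if g in sample)
        profile[moa] = pearson([sample[g] for g in genes],
                               [expected[g] for g in genes])
    return profile


def best_match(profile):
    """MOA whose classifier correlates most strongly with the sample."""
    return max(profile, key=profile.get)


# Hypothetical classifiers and unknown compound.
classifiers = {
    "moa_1": {"G1": 2.0, "G2": -1.5, "G3": 0.5},
    "moa_2": {"G1": -1.0, "G2": 1.0, "G3": 2.0},
}
unknown = {"G1": 1.8, "G2": -1.2, "G3": 0.4}
profile = moa_profile_map(unknown, classifiers)
print(best_match(profile))  # moa_1
```

The dictionary returned by `moa_profile_map` corresponds to one row of the profile map; repeating the call for each unknown and each reference compound yields the rows that would be compared in steps 8 and 9.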

Alternative embodiments of the invention include finding the genes that always move in each compound treatment within a class, comparing the compound treatments for a class to all other compounds and determining what distinguishes them, and/or generating multiclass comparisons where genes are identified that are unique for each class.

In one such application, for every compound in the data set the genes that changed expression level upon chemical treatment are identified. This can be done by comparing the expression level to the expression level of a negative control compound treatment. This is typically vehicle treated cells, but could be an inactive compound or no treatment. Compounds are divided into classes based on known mechanisms of action and on unsupervised expression analysis, to identify compounds which act in a similar manner.
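The comparison against a negative control can be sketched as a fold-change filter. The example below is illustrative only: the two-fold cutoff (an absolute log2 fold change of 1.0) is an assumption chosen for the sketch and is not specified in the disclosure, and the function name and gene labels are hypothetical.

```python
from math import log2


def changed_genes(treated, vehicle, min_abs_log2_fc=1.0):
    """Genes whose expression differs from the vehicle control.

    treated, vehicle: dict gene -> expression level (positive numbers)
    min_abs_log2_fc: assumed cutoff; 1.0 corresponds to a two-fold change
    Returns dict gene -> log2 fold change for genes passing the cutoff.
    """
    changes = {}
    for gene, level in treated.items():
        baseline = vehicle.get(gene)
        if baseline is None or baseline <= 0 or level <= 0:
            continue  # skip genes with missing or non-positive values
        fold = log2(level / baseline)
        if abs(fold) >= min_abs_log2_fc:
            changes[gene] = fold
    return changes


# Hypothetical expression levels for a compound and its vehicle control.
treated = {"G1": 4.0, "G2": 1.1, "G3": 0.25}
vehicle = {"G1": 1.0, "G2": 1.0, "G3": 1.0}
print(changed_genes(treated, vehicle))  # {'G1': 2.0, 'G3': -2.0}
```

An inactive compound or untreated cells could be passed as the `vehicle` argument in the same way.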

In accordance with the foregoing, all compounds in a class are analyzed to identify single genes or combinations of genes that behave in a similar way in all samples within the class. This identifies a signature from the class, but makes no assumptions about it being unique compared to other classes. The expression of compound treatments within a class is then compared to all other treatments in the database. This allows the identification of patterns of genes that uniquely define each class from all others. This gives what is unique to each class but is dependent on the completeness of the database. A multiclass discriminator would identify those gene sets that are unique to each class and do not overlap another class.
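One simple reading of "behave in a similar way in all samples within the class" is that a gene moves in the same direction in every sample. The sketch below adopts that reading as an assumption; the function names and gene labels are hypothetical. The second function removes signature genes that show the same direction in any other class, which approximates the uniqueness comparison and, like it, depends on which classes are present in the database.

```python
def class_signature(samples):
    """Genes that move in the same direction in every sample of a class.

    samples: list of dicts gene -> signed expression change
             (positive = up-regulated, negative = down-regulated)
    Returns dict gene -> "up" or "down".
    """
    if not samples:
        return {}
    shared = set(samples[0]).intersection(*(set(s) for s in samples[1:]))
    signature = {}
    for gene in shared:
        values = [s[gene] for s in samples]
        if all(v > 0 for v in values):
            signature[gene] = "up"
        elif all(v < 0 for v in values):
            signature[gene] = "down"
    return signature


def unique_signature(target, other_signatures):
    """Keep only signature genes not shared, in direction, by another class."""
    return {gene: direction for gene, direction in target.items()
            if all(other.get(gene) != direction for other in other_signatures)}


# Hypothetical class with two compound treatments.
sig = class_signature([
    {"G1": 2.0, "G2": -1.0, "G3": 0.5},
    {"G1": 1.0, "G2": -3.0, "G3": -0.5},
])
print(sig == {"G1": "up", "G2": "down"})    # True
print(unique_signature(sig, [{"G1": "up"}]))  # {'G2': 'down'}
```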

Such comparisons can be done in several ways:

1) T-test based analysis, where each gene is tested for a difference between the populations
2) Nonparametric approaches like SAM, where no assumptions are made about distributions as in 1)
3) Bayesian approaches, which use the probabilities of the differences
4) Combinatorial gene changes, where sets of genes are used together, such as “if gene A and gene B change it is class 1, but if only one changes it is class 2.”
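The first of these approaches can be sketched with a per-gene t statistic. The example below uses Welch's form of the t-test (which does not assume equal variances) as one possible choice; the disclosure does not specify which variant is meant. Function names and gene labels are hypothetical, and no p-value or multiple-testing correction is computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two measurements")
    std_err = sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    if std_err == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / std_err


def rank_genes(class_a, class_b):
    """Rank genes by absolute t statistic, largest first.

    class_a, class_b: dict gene -> list of expression values, one per sample
    Returns a list of (gene, t) pairs.
    """
    scores = {gene: welch_t(class_a[gene], class_b[gene])
              for gene in class_a if gene in class_b}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values for two compound classes.
class_1 = {"G1": [1.0, 1.1, 0.9], "G2": [1.0, 2.0, 3.0]}
class_2 = {"G1": [3.0, 3.1, 2.9], "G2": [1.5, 2.5, 3.5]}
print(rank_genes(class_1, class_2)[0][0])  # G1
```

Genes at the top of the ranking are the candidates that best separate the two classes under this test.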

In one such example, nonlinear associations include cases such as where a given gene is high or low and is placed in class 1, while no change results in class 2.
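The combinatorial rule quoted above and the nonlinear association just described can both be written as small decision functions. This is a sketch only: the threshold that separates "changed" from "no change" is an assumed value, and the gene labels and function names are hypothetical.

```python
def direction(change, threshold=1.0):
    """Label a signed expression change as "up", "down" or "none"."""
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "none"


def combinatorial_class(changes, gene_a="geneA", gene_b="geneB", threshold=1.0):
    """Class 1 if both genes change, class 2 if exactly one changes.

    Returns None when neither gene changes, since the quoted rule
    does not assign a class in that case.
    """
    n_changed = sum(
        direction(changes.get(gene, 0.0), threshold) != "none"
        for gene in (gene_a, gene_b)
    )
    return {2: 1, 1: 2}.get(n_changed)


def nonlinear_class(change, threshold=1.0):
    """Class 1 if the gene is high or low, class 2 if it is unchanged."""
    return 2 if direction(change, threshold) == "none" else 1


print(combinatorial_class({"geneA": 2.0, "geneB": -1.5}))  # 1
print(combinatorial_class({"geneA": 2.0, "geneB": 0.1}))   # 2
print(nonlinear_class(-3.0))                               # 1
print(nonlinear_class(0.2))                                # 2
```

The nonlinear case is one that a simple correlation would miss, because both high and low expression map to the same class.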

It should be cautioned that, in carrying out the procedures of the present invention as disclosed herein, any reference to particular buffers, media, reagents, cells, culture conditions and the like is not intended to be limiting, but is to be read so as to include all related materials that one of ordinary skill in the art would recognize as being of interest or value in the particular context in which that discussion is presented. For example, it is often possible to substitute one buffer system or culture medium for another and still achieve similar, if not identical, results. Those of skill in the art will have sufficient knowledge of such systems and methodologies so as to be able, without undue experimentation, to make such substitutions as will optimally serve their purposes in using the methods and procedures disclosed herein.

The present invention will now be further described by way of the following non-limiting example. In applying the disclosure of the example, it should be kept clearly in mind that other and different embodiments of the methods disclosed according to the present invention will no doubt suggest themselves to those of skill in the relevant art. The following example shows how a potential anti-neoplastic agent may be identified using one or more of the genes disclosed herein.

EXAMPLE

HT29 cells are grown to a density of 10⁵ cells/cm² in Leibovitz's L-15 medium supplemented with 2 mM L-glutamine (90%) and 10% fetal bovine serum. The cells are collected after treatment with 0.25% trypsin, 0.02% EDTA at 37° C. for 2 to 5 minutes. The trypsinized cells are then diluted with 30 ml growth medium and plated at a density of 50,000 cells per well in a 96 well plate (200 μl/well). The following day, cells are treated with either compound buffer alone, or compound buffer containing a chemical agent to be tested, for 24 hours. The medium is then removed, the cells lysed and the RNA recovered using the RNeasy reagents and protocol obtained from Qiagen. RNA is quantitated and 10 ng of sample in 1 μl are added to 24 μl of TaqMan reaction mix containing 1× PCR buffer, RNasin, reverse transcriptase, nucleoside triphosphates, AmpliTaq Gold, Tween 20, glycerol, bovine serum albumin (BSA) and specific PCR primers and probes for a reference gene (18S RNA) and a test gene (Gene X). Reverse transcription is then carried out at 48° C. for 30 minutes. The sample is then applied to a Perkin Elmer 7700 sequence detector and heat denatured for 10 minutes at 95° C. Amplification is performed through 40 cycles using 15 seconds annealing at 60° C. followed by a 60 second extension at 72° C. and 30 second denaturation at 95° C. Data files are then captured and the data analyzed with the appropriate baseline windows and thresholds.

The quantitative difference between the target and reference genes is then calculated and a relative expression value determined for all of the samples used. This procedure is then repeated for each of the target genes in a given signature, or characteristic, set and the relative expression ratios for each pair of genes are determined (i.e., a ratio of expression is determined for each target gene versus each of the other genes for which expression is measured, where each gene's absolute expression is determined relative to the reference gene for each compound, or chemical agent, to be screened). The samples are then scored and ranked according to the degree of alteration of the expression profile in the treated samples relative to the control. The overall expression of the set of genes relative to the controls, as modulated by one chemical agent relative to another, is also ascertained. Chemical agents having the most effect on a given gene, or set of genes, are considered the most anti-neoplastic.
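The example does not state the formula used to turn the target and reference measurements into a relative expression value. One common way to do so with real time PCR data is the comparative threshold cycle (Ct) calculation, sketched below as an assumption: it presumes that the instrument reports a Ct for each gene and that amplification efficiency is close to 100%, so that each cycle corresponds to a two-fold difference. The function names and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Target expression relative to the reference gene.

    Computes 2 ** -(Ct_target - Ct_reference), assuming a doubling
    of product per cycle (about 100% amplification efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Fold change of the target gene in treated versus control cells.

    Each sample is first normalized to its own reference gene, and the
    treated sample is then compared to the control sample.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target appears two cycles later after treatment.
print(fold_change(27.0, 15.0, 25.0, 15.0))  # 0.25
```

Fold changes computed this way for each gene of a signature set could then be used to score and rank the treated samples against the control.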

1. A method for identifying a compound with a selected activity, comprising: (a) determining a change in the expression profile of a selected set of genes in the presence and absence of a first compound having a selected activity, (b) determining a change in the expression profile of the selected set of genes of step (a) in the presence and absence of a second compound, (c) comparing said determined change in expression profile in step (b) with that in step (a) wherein a determination in step (c) of the same or similar change in said expression profile identifies said second compound as a compound having said selected activity.
 2. The method of claim 1 wherein said selected activity is antineoplastic activity.
 3. The method of claim 1 wherein said selected set of genes is present in a cell.
 4. The method of claim 1 wherein said expression is transcription.
 5. The method of claim 1 wherein said change in expression profile is determined by determining synthesis of RNA.
 6. The method of claim 1 wherein said change in expression profile is determined by determining polypeptide synthesis.
 7. The method of claim 1 wherein said selected activity is inducing a physiological change in a cell.
 8. The method of claim 1 wherein said selected activity is therapeutic activity.
 9. The method of claim 1 wherein said selected activity is enzyme inhibitory activity.
 10. The method of claim 1 wherein the compound of step (a) is a topoisomerase II inhibitor.
 11. The method of claim 1 wherein the compound of step (a) is a member selected from the group consisting of Camptothecine (S, +), beta-Lapachone, Suramin sodium salt, Aclacinomycin A from Streptomyces galilaeus, Mitoxantrone dihydrochloride, Etoposide, Doxorubicin hydrochloride, Aurintricarboxylic acid, Epirubicin hydrochloride, and m-AMSA hydrochloride.
 12. The method of claim 3 wherein the cell is a colon cell.
 13. The method of claim 12 wherein said cell is a cancer cell.
 14. The method of claim 3 wherein the cell is a recombinant cell engineered to express said selected set of genes.
 15. The method of claim 14 wherein said recombinant cell does not express said selected set of genes absent said engineering.
 16. The method of claim 3 wherein said selected set of genes is part of the genome of said cell.
 17. A related gene set comprising genes whose polynucleotide sequences correspond to the sequences of SEQ ID NO: 1-12.
 18. The method of claim 1 wherein said determined genes are in the gene set of claim 17.
 19. A compound identified as having therapeutic activity by the method of claim 1.
 20. A compound identified as having anti-neoplastic activity by the method of claim 1.
 21. A compound identified as having enzyme inhibitory activity by the method of claim 1.
 22. A method for treating a disease comprising administering to an animal afflicted with said disease of a therapeutically effective amount of the compound of claim 19.
 23. A method for treating cancer comprising administering to an animal afflicted with cancer of a therapeutically effective amount of the compound of claim 20.
 24. A method for identifying a related gene set comprising: contacting a cell with each of a plurality of compounds having common biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting where contacting with each of said plurality of compounds results in the same relative changes of expression of said genes and thereby identifying said genes as a related gene set.
 25. The method of claim 24 wherein said biological activity is therapeutic activity.
 26. The method of claim 24 wherein said biological activity is enzyme inhibitory activity.
 27. The method of claim 24 wherein said biological activity is anti-neoplastic activity.
 28. The method of claim 24 wherein said plurality of compounds are topoisomerase II inhibitors.
 29. The method of claim 24 wherein said plurality of compounds comprise members selected from the group consisting of Camptothecine (S, +), beta-Lapachone, Suramin sodium salt, Aclacinomycin A from Streptomyces galilaeus, Mitoxantrone dihydrochloride, Etoposide, Doxorubicin hydrochloride, Aurintricarboxylic acid, Epirubicin hydrochloride, and m-AMSA hydrochloride.
 30. A method for producing test data with respect to a biological activity of a compound comprising: (a) contacting a cell with each of a plurality of compounds exhibiting similar biological activity and determining a change in the expression of a plurality of genes of said cell as a result of said contacting whereby the relative changes in expression of said genes together forms a gene expression profile; (b) contacting a compound different from that of (a) with the determined genes of (a) and determining a change in expression of said determined genes as a result of said contacting whereby the relative changes in expression of said determined genes together forms the gene expression profile of (a) thereby identifying a biologically active compound; and (c) producing test data with respect to the gene modulating activity of said compound based on the gene expression profile indicating biological activity.
 31. A recombinant cell expressing a related gene set identified by the method of claim 24.
 32. A recombinant cell expressing the related gene set of claim 17.